Discriminant Sub-Space Projection of Spectro-Temporal Speech Features Based on Maximizing Mutual Information
Abstract
We previously developed noise-robust Hierarchical Spectro-Temporal (HIST) speech features. The features were learned in an unsupervised way from unlabeled speech data. In a final stage we deployed Principal Component Analysis (PCA) to reduce the feature dimensions and to diagonalize them. In this paper we investigate whether a discriminant projection can further increase the performance. We maximize the mutual information between the features and the phoneme categories using a procedure known as Maximizing Renyi's Mutual Information (MRMI) and also compare it to Linear Discriminant Analysis (LDA). Based on recognition tests in clean and in noisy conditions, i.e. in matching and mismatching conditions, we show that the discriminant projections increase recognition scores compared to PCA in matching conditions. However, this improvement does not transfer to the mismatching, i.e. noisy, conditions. We discuss measures to alleviate this problem. Overall, MRMI performs better than LDA.
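The contrast between an unsupervised PCA projection and a discriminant projection can be illustrated with a small sketch. This is not the authors' implementation: it uses synthetic two-class data in place of HIST features, a textbook scatter-matrix LDA, and a nearest-class-mean classifier, all of which are assumptions made for illustration. MRMI is not shown. The helper names (`pca_projection`, `lda_projection`, `nearest_mean_accuracy`) are hypothetical.

```python
import numpy as np


def pca_projection(X, k):
    """Return a (d, k) matrix whose columns are the top-k principal directions."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, order]


def lda_projection(X, y, k):
    """Return a (d, k) matrix from textbook LDA (between- vs within-class scatter)."""
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Directions are eigenvectors of Sw^{-1} Sb with the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(eigvals.real)[::-1][:k]
    return eigvecs[:, order].real


def nearest_mean_accuracy(Z, y):
    """Classify each row of Z by its nearest class mean; return accuracy."""
    classes = np.unique(y)
    means = np.stack([Z[y == c].mean(axis=0) for c in classes])
    dists = ((Z[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return float((classes[dists.argmin(axis=1)] == y).mean())


# Synthetic data (assumption): dimension 0 has large variance but carries no
# class information; dimension 1 has small variance but separates the classes.
rng = np.random.default_rng(0)
n = 500
y = np.repeat([0, 1], n)
x0 = rng.normal(0.0, 10.0, size=2 * n)
x1 = rng.normal(np.where(y == 0, -1.0, 1.0), 0.5)
X = np.column_stack([x0, x1])

# Keeping all PCA components decorrelates the features (diagonal covariance).
Z_full = (X - X.mean(axis=0)) @ pca_projection(X, 2)
cov_pca = np.cov(Z_full, rowvar=False)

# Project to one dimension with each method and compare.
acc_pca = nearest_mean_accuracy(X @ pca_projection(X, 1), y)
acc_lda = nearest_mean_accuracy(X @ lda_projection(X, y, 1), y)
print(f"PCA accuracy: {acc_pca:.3f}, LDA accuracy: {acc_lda:.3f}")
```

In this toy setting PCA keeps the high-variance direction regardless of class labels, while LDA keeps the direction that separates the classes. The sketch only covers matched train and test data, so it says nothing about the mismatched (noisy) conditions discussed in the abstract.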
Similar resources
Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain
This article presents a new feature extraction technique based on the temporal tracking of clusters in the spectro-temporal feature space. In the proposed method, auditory cortical outputs were clustered. The attributes of speech clusters were extracted as secondary features. However, the shape and position of speech clusters change over time. The clusters were temporally tracked and temporal tra...
Maximum conditional mutual information projection for speech recognition
Linear discriminant analysis (LDA) in its original model-free formulation is best suited to classification problems with equal-covariance classes. Heteroscedastic discriminant analysis (HDA) removes this equal-covariance constraint, and therefore is more suitable for automatic speech recognition (ASR) systems. However, maximizing the HDA objective function does not correspond directly to minimizing ...
Combining Feature Space Discriminative Training with Long-Term Spectro-Temporal Features for Noise-Robust Speech Recognition
Discriminative training of the feature space using the maximum mutual information (fMMI) objective function has been shown to yield remarkable accuracy improvements. For noisy environments, fMMI can be regarded as an effective noise compensation algorithm and can play a significant role for noise robustness. Feature space speaker adaptation techniques such as feature space maximum likelihood linear reg...
Optimal feature sub-space selection based on discriminant analysis
The performance of a speech recogniser, or of any other pattern classifier, strongly depends on the input features: to obtain a good performance, the feature set needs to be both highly discriminative and compact. Linear discriminant analysis (LDA) is a common data-driven method used to find linear transformations that map large feature vectors onto smaller ones while retaining most of the disc...
Minimum Classification Error Based Spectro-Temporal Feature Extraction for Robust Audio Classification
Mel-frequency cepstral coefficients (MFCCs) are the most popular features for automatic audio classification (AAC). However, MFCCs are often not robust in adverse environments. In this paper, a minimum classification error (MCE)-based method is proposed to extract new and robust spectro-temporal features as alternatives to MFCCs. The robustness of the proposed new features is evaluated on noisy ...